My 2nd year of grad school I took a fantastic course called Computational Methods in Population Biology (ECL 233), taught by one of my co-advisors, Sebastian Schreiber, and one of my QE committee members, Marissa Baskett. The course was split roughly evenly between ecologists and applied math students, and focused on applied mathematical modeling concepts. As a quantitative ecologist with relatively sparse formal mathematical training but pretty solid computational/R skills, I found this course incredible. Being able to implement models in R meant I could poke and prod them, changing parameters or investigating intermediate values. This helped me translate the more formal mathematical models into data I could work with like any other. Richard McElreath notes a similar pedagogical benefit of sampling from posterior distributions in Bayesian statistics in his fantastic book, Statistical Rethinking. (I’m not usually one to fawn over academic stuff, but Statistical Rethinking absolutely changed the way I think. Not only did it give me a deep appreciation and intuition for Bayesian data analysis, it’s an absolute pedagogical marvel. If you haven’t read it yet, run, don’t walk, to grab a copy.)
Back when I took ECL 233, I was a relatively confident R user, but
nowhere near as competent as I am now. In particular, I’ve come to
embrace a tidier approach to working with data, trying to
keep things in dataframes/tibbles as much as possible. This has been
remarkably slick while simulating data for the sake of testing out
statistical models or more formal simulation-based calibration (perhaps
the topic of a later blog post). You set up parameters in their own
columns, so each row has a full set of parameters necessary to generate
data, then you store the resulting data in a list-column.
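As a tiny illustration of the pattern (column names and the toy simulation here are made up, just to show the shape of the thing):

```r
library(tidyverse)

# One row per parameter set; the simulated data live in a list-column
sims <- tibble(mu = c(0, 5), sigma = c(1, 2)) %>%
  mutate(draws = map2(mu, sigma, ~ rnorm(100, mean = .x, sd = .y)))

sims  # a 2-row tibble; each element of `draws` is a length-100 vector
```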
In discrete-time population models, you can’t vectorize everything
since the population size is calculated from the population size at the
last time step, so you’ve gotta use a for loop at some
point. What this often means is that population simulations are stored
in matrices or arrays; for example, you might have a matrix where each
column corresponds to an r value for your model, and each
row corresponds to a time step. You then use a for loop to
generate the time series for each r value’s column. R is
pretty nice for working with matrices and arrays, and they’ll often be
the fastest/most efficient way to implement big ole models. But in some
cases, it would be really nice to be able to use a more
tidy approach, for the sake of plotting and organization.
It can be really nice to have all your parameters neatly associated with
all your simulated outcomes in a single tibble. I hate having a bunch of
disparate vectors and matrices floating around for a single model, so
this approach really appeals to me.
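For concreteness, the matrix version described above looks roughly like this (a sketch; the variable names and parameter values are mine):

```r
# One column per r value, one row per time step
r_vals <- c(1.5, 2.3, 2.6, 3.0)
k  <- 100
tf <- 80
N  <- matrix(NA_real_, nrow = tf, ncol = length(r_vals))
N[1, ] <- 50  # starting population size for every column

# The update is vectorized across columns, but we still loop over time
for (t in 1:(tf - 1)) {
  N[t + 1, ] <- N[t, ] * exp(r_vals * (1 - N[t, ] / k))
}
```

This is fast and compact, but the parameters (`r_vals`, `k`, `n0`) are floating around as separate objects, disconnected from the matrix they generated.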
To demonstrate this approach, I reformatted one of my ECL 233 homework assignments looking at the Ricker model (see Ricker’s original paper) and its ability to generate C H A O S. I’ll show how a tidy approach makes it easy to look at different parameter values, plot our results, and calculate various things about our model.
Rick Astley’s fashion sense is like a
classic theoretical paper: timeless. Seriously, get a load of that black
turtleneck and trousers under a stone trench coat…
library(tidyverse)
library(patchwork)
my_theme <- theme_minimal() + theme(panel.grid = element_blank())
First we write a function that computes a single update of the Ricker model, n[t+1] = n[t] * exp(r * (1 - n[t] / k)), then a function that takes parameter inputs, a starting value, and a number of iterations, and generates values from the Ricker model through time.
RickRoll <- function(nt, r, k){
  # One step of the Ricker map: n[t+1] = n[t] * exp(r * (1 - n[t] / k))
  nt * exp(r * (1 - nt / k))
}

RickRollIter <- function(r, k, n0, tf){
  data <- rep(n0, tf)  # preallocate; data[1] holds the starting value
  for (t in 1:(tf - 1)) {
    data[t + 1] <- RickRoll(nt = data[t], r = r, k = k)
  }
  return(tibble(results = data,
                timestep = 1:tf))
}
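Before iterating, a quick sanity check on the update function can’t hurt: k should be a fixed point of the map (at n = k the exponent is zero), populations below k should grow, and populations above k should shrink.

```r
# RickRoll repeated here so this chunk runs on its own
RickRoll <- function(nt, r, k){
  nt * exp(r * (1 - nt / k))
}

RickRoll(nt = 100, r = 2.5, k = 100)  # exactly 100: k is a fixed point
RickRoll(nt = 50,  r = 1.5, k = 100)  # > 50: below k the population grows
RickRoll(nt = 150, r = 1.5, k = 100)  # < 150: above k it shrinks
```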
Next, we’ll set up our basic simulation table, which holds parameters
in a nice tidy format. Here we use the same values for k,
n0, and tf, but we look at a few different
values of r.
sim <- tibble(k = 100,
              n0 = 50,
              tf = 80,
              r = c(1.5, 2.3, 2.6, 3.0))
Next, we use the pmap() function to operate on each row
of our tibble individually. We take the parameter inputs and use the
RickRollIter() function to generate a time series, which
gets stored in a list-column.
sim <- sim %>%
  mutate(results = pmap(.l = list(r, k, n0, tf), .f = RickRollIter))
sim
## # A tibble: 4 × 5
## k n0 tf r results
## <dbl> <dbl> <dbl> <dbl> <list>
## 1 100 50 80 1.5 <tibble [80 × 2]>
## 2 100 50 80 2.3 <tibble [80 × 2]>
## 3 100 50 80 2.6 <tibble [80 × 2]>
## 4 100 50 80 3 <tibble [80 × 2]>
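One caveat worth knowing: with an unnamed .l, pmap() matches list elements to function arguments by position, so list(r, k, n0, tf) has to mirror RickRollIter()’s signature exactly. Naming the list elements (list(r = r, k = k, ...)) makes pmap() match by argument name instead, which is safer. A toy illustration (the function f and its arguments are made up):

```r
library(purrr)

# A toy function whose argument order we might misremember
f <- function(a, b) a - b

pmap_dbl(list(10, 3), f)          # positional: evaluates f(10, 3)
pmap_dbl(list(b = 3, a = 10), f)  # named: still evaluates f(a = 10, b = 3)
```

Both calls return 7 here, but only the named version keeps working if the columns are ever reordered.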
Finally, we can unnest() this column, then plot the
population size through time for each separate value of
r.
sim %>%
  unnest(results) %>%
  ggplot(aes(x = timestep, y = results)) +
  geom_point() +
  geom_line() +
  facet_wrap(vars(r)) +
  ylab("Pop size") +
  my_theme
With this approach we can compare the effects of r and
k separately with only a few more lines of code. We use
expand_grid() to generate every unique combination of the
r and k values we supply, then add our single
values of n0 and tf. The rest proceeds much as before, except this time
we build the list-column with rowwise() and list() rather than pmap().
expand_grid(k = c(100, 200, 500),
            r = c(1.5, 2.3, 3.0)) %>%
  mutate(n0 = 50, tf = 80) %>%
  rowwise() %>%
  mutate(results = list(
    RickRollIter(r = r, k = k, n0 = n0, tf = tf))) %>%
  ungroup() %>%  # drop the rowwise grouping before unnesting
  unnest(results) %>%
  ggplot(aes(x = timestep, y = results)) +
  geom_point() +
  geom_line() +
  facet_grid(rows = vars(r), cols = vars(k)) +
  ylab("Pop size") +
  my_theme
Now let’s generate a classic bifurcation diagram, in the style of the
famous logistic map figure, which shows how a discrete deterministic
model can generate C H A O S as the value of r
increases. We do almost the same thing as before, but we need a much
finer grid of r values to make a nice plot.
sim_bif <- tibble(k = 100,
                  n0 = 50,
                  tf = 1000,
                  r = seq(from = 1.5, to = 3.6, by = 0.001)) %>%
  mutate(results = pmap(.l = list(r, k, n0, tf), .f = RickRollIter))
Now, instead of plotting population sizes across time, we plot
population sizes by r value. Because the Ricker model needs
some time to settle in on a stable population size, we’ll only plot the
last 40 time steps for each r value.
sim_bif %>%
  unnest(results) %>%
  filter(timestep > max(timestep) - 40) %>%  # keep the last 40 time steps
  ggplot(aes(x = r, y = results)) +
  geom_point(alpha = 0.3, size = 0.1) +
  ylab("Pop size") +
  my_theme
Now with a little color, mapped to the number of distinct population values the model settles into at each value of r.
sim_bif %>%
  unnest(results) %>%
  filter(timestep > max(timestep) - 40) %>%  # keep the last 40 time steps
  group_by(r) %>%
  mutate(results_round = round(results, 2),
         n_vals = n_distinct(results_round)) %>%
  ungroup() %>%
  ggplot(aes(x = r, y = results, color = n_vals)) +
  geom_point(alpha = 0.3, size = 0.5) +
  my_theme +
  theme(legend.position = c(0.15, 0.1), legend.direction = "horizontal") +
  ylab("Pop size") +
  colorspace::scale_color_continuous_sequential(
    palette = "Purple-Yellow")